1 About

1.1 Contributions

Please note that authorship is alphabetical. Contributions are listed below - see github for details and who to blame for what :-).

2 Intro

Exploring covid 19 and related data.

This tweet:

Suggested something strange was happening with overall USA deaths in early 2020.:

USA deaths plot

USA deaths plot

This very brief analysis attempts to re-create some of their results from the same data.

3 Data

Sourced from the downloads button of https://gis.cdc.gov/grasp/fluview/mortality.html

dt <- data.table::fread(paste0(myParams$dPath, "/USA-all-data.csv"))

t <- table(dt$SEASON)
kableExtra::kable(t, caption = "Seasons used") %>% kable_styling()
Table 3.1: Seasons used
Var1 Freq
2015-16 52
2016-17 52
2017-18 52
2018-19 52
2019-20 24

Note that the 2020 data is incomplete.

Before we go further we’ll convert week numbers (in the data) to standardised dates. This will mean some of the dates for the weeks are not exactly right but it might make for easier plotting.

# why can't they just be published with dates??

dt[, `:=`(date, ifelse(SEASON == "2015-16" & WEEK > 39, as.Date("2015-01-01") + lubridate::weeks(WEEK), 
    NA))]
dt[, `:=`(date, ifelse(SEASON == "2015-16" & WEEK < 40, as.Date("2016-01-01") + lubridate::weeks(WEEK), 
    date))]
dt[, `:=`(date, ifelse(SEASON == "2016-17" & WEEK > 39, as.Date("2016-01-01") + lubridate::weeks(WEEK), 
    date))]
dt[, `:=`(date, ifelse(SEASON == "2016-17" & WEEK < 40, as.Date("2017-01-01") + lubridate::weeks(WEEK), 
    date))]
dt[, `:=`(date, ifelse(SEASON == "2017-18" & WEEK > 39, as.Date("2017-01-01") + lubridate::weeks(WEEK), 
    date))]
dt[, `:=`(date, ifelse(SEASON == "2017-18" & WEEK < 40, as.Date("2018-01-01") + lubridate::weeks(WEEK), 
    date))]
dt[, `:=`(date, ifelse(SEASON == "2018-19" & WEEK > 39, as.Date("2018-01-01") + lubridate::weeks(WEEK), 
    date))]
dt[, `:=`(date, ifelse(SEASON == "2018-19" & WEEK < 40, as.Date("2019-01-01") + lubridate::weeks(WEEK), 
    date))]
dt[, `:=`(date, ifelse(SEASON == "2019-20" & WEEK > 39, as.Date("2019-01-01") + lubridate::weeks(WEEK), 
    date))]
dt[, `:=`(date, ifelse(SEASON == "2019-20" & WEEK < 40, as.Date("2020-01-01") + lubridate::weeks(WEEK), 
    date))]
dt[, `:=`(date, lubridate::as_date(date))]

# check coding head(dt[WEEK == 1])
t <- head(dt[WEEK == 1 | WEEK == 52, .(SEASON, WEEK, date)])

kableExtra::kable(t, captopn = "Check we coded the weeks correctly") %>% kable_styling()
SEASON WEEK date
2019-20 52 2019-12-31
2019-20 1 2020-01-08
2018-19 52 2018-12-31
2018-19 1 2019-01-08
2017-18 52 2017-12-31
2017-18 1 2018-01-08
# also clean up data where needed
dt[, `:=`(totDeaths, as.numeric(gsub(",", "", `TOTAL DEATHS`)))]
dt[, `:=`(pneuDeaths, as.numeric(gsub(",", "", `NUM PNEUMONIA DEATHS`)))]
dt[, `:=`(fluDeaths, as.numeric(gsub(",", "", `NUM INFLUENZA DEATHS`)))]

dt$V13 <- NULL
dt$V14 <- NULL
dt$V15 <- NULL
dt$V16 <- NULL
dt$V17 <- NULL
dt$V18 <- NULL

skimr::skim(dt)
Table 3.2: Data summary
Name dt
Number of rows 232
Number of columns 16
_______________________
Column type frequency:
character 7
Date 1
logical 1
numeric 7
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
AREA 0 1 8 8 0 1 0
AGE GROUP 0 1 3 3 0 1 0
SEASON 0 1 7 7 0 5 0
NUM INFLUENZA DEATHS 0 1 1 5 0 135 0
NUM PNEUMONIA DEATHS 0 1 5 5 0 220 0
TOTAL DEATHS 0 1 6 6 0 228 0
PERCENT COMPLETE 0 1 6 6 0 2 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
date 0 1 2015-10-08 2020-03-18 2017-12-27 232

Variable type: logical

skim_variable n_missing complete_rate mean count
SUB AREA 232 0 NaN :

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
WEEK 0 1 26.62 15.67 1.0 12.00 27.0 41.00 52.0 ▇▆▆▆▇
THRESHOLD 0 1 6.78 0.78 5.4 6.07 6.8 7.40 8.2 ▆▆▅▇▅
BASELINE 0 1 6.41 0.77 5.1 5.70 6.4 7.00 7.8 ▇▆▇▇▆
PERCENT P&I 0 1 6.56 1.12 4.9 5.70 6.2 7.32 10.9 ▇▅▃▁▁
totDeaths 0 1 53688.44 3467.98 37641.0 51073.50 53119.5 55809.75 67495.0 ▁▁▇▃▁
pneuDeaths 0 1 3380.11 591.94 2431.0 2924.50 3214.5 3782.50 5583.0 ▇▇▅▁▁
fluDeaths 0 1 167.95 275.90 2.0 13.75 31.5 257.50 1626.0 ▇▂▁▁▁
tdt <- dt[`PERCENT COMPLETE` != "> 100%"]
kableExtra::kable(table(tdt$date, tdt$`PERCENT COMPLETE`), caption = "Incomplete data...") %>% 
    kable_styling()
Table 3.2: Incomplete data…
79.10%
2020-03-18 1
dt <- dt[`PERCENT COMPLETE` == "> 100%"]

We remove the incomplete data from further analysis.

4 All deaths

Figure 4.1 is the one causing the interest. Note that we have set the y axis to start at zero to avoid over-emphasising the slope. Week 10 is roughly:

dt[WEEK == 10, .(SEASON, WEEK, date)]
##     SEASON WEEK       date
## 1: 2019-20   10 2020-03-11
## 2: 2018-19   10 2019-03-12
## 3: 2017-18   10 2018-03-12
## 4: 2016-17   10 2017-03-12
## 5: 2015-16   10 2016-03-11
week10Text <- paste0("Week 10: ", dt[WEEK == 10 & SEASON == "2019-20", (date)])
addWeek10Label <- function(p) {
    p <- p + annotate("text", x = 10.5, y = yMax * 0.4, angle = 10, size = 3, label = week10Text, 
        hjust = 0)
    p <- p + geom_vline(xintercept = 10, colour = "#CC79A7", alpha = myParams$vLineAlpha)
    return(p)
}

p <- ggplot2::ggplot(dt, aes(y = totDeaths, x = WEEK, colour = SEASON, group = SEASON)) + 
    geom_point() + ylim(0, NA) + labs(x = "Week", y = "Total deaths", caption = "Plot by @dataknut using data from https://gis.cdc.gov/grasp/fluview/mortality.html\nIncomplete data excluded")
yMax <- max(dt$totDeaths)
p <- addWeek10Label(p)
p
USA recorded deaths by flu season (all deaths)

Figure 4.1: USA recorded deaths by flu season (all deaths)

Figure 4.2: interactive version (hover the mouse over a dot).

plotly::ggplotly(p)

Figure 4.2: USA recorded deaths by flu season (all deaths)

5 Deaths from pneumonia

Figure 5.1 is pneumonia deaths. They are trending downwards too.

p <- ggplot2::ggplot(dt, aes(y = pneuDeaths, x = WEEK, colour = SEASON, group = SEASON)) + 
    geom_point() + ylim(0, NA) + labs(x = "Week", y = "Pneumonia deaths", caption = "Plot by @dataknut using data from https://gis.cdc.gov/grasp/fluview/mortality.html")
yMax <- max(dt$pneuDeaths)
p <- addWeek10Label(p)
p
USA recorded Pneumonia deaths by flu season

Figure 5.1: USA recorded Pneumonia deaths by flu season

The ‘missing’ deaths here don’t seem to account for the overall total deaths reduction. It seems that pneumonia deaths are not being re-classed as covid since USA covid deaths are not recorded as starting until after March 10th. Although it is possible that earlier covid deaths were not recorded as covid or pneuomina (or influenza).

6 Deaths from influenza

Figure 6.1 is influenza deaths.

p <- ggplot2::ggplot(dt, aes(y = fluDeaths, x = WEEK, colour = SEASON, group = SEASON)) + 
    geom_point() + ylim(0, NA) + labs(x = "Week", y = "Influenza deaths", caption = "Plot by @dataknut using data from https://gis.cdc.gov/grasp/fluview/mortality.html")
yMax <- max(dt$fluDeaths)
p <- addWeek10Label(p)
p
USA recorded Influenza deaths by flu season

Figure 6.1: USA recorded Influenza deaths by flu season

Yep, 2017-2018 was a bad flu year in the USA

7 Runtime

Report generated using knitr in RStudio with R version 3.6.3 (2020-02-29) running on x86_64-apple-darwin15.6.0 (Darwin Kernel Version 19.4.0: Wed Mar 4 22:28:40 PST 2020; root:xnu-6153.101.6~15/RELEASE_X86_64).

t <- proc.time() - myParams$startTime

elapsed <- t[[3]]

Analysis completed in 5.524 seconds ( 0.09 minutes).

R packages used:

  • data.table - (Dowle et al. 2015)
  • ggplot2 - (Wickham 2009)
  • here - (Müller 2017)
  • kableExtra - (Zhu 2018)
  • lubridate - (Grolemund and Wickham 2011)
  • plotly - (Sievert et al. 2016)
  • skimr - (Arino de la Rubia et al. 2017)

References

Arino de la Rubia, Eduardo, Hao Zhu, Shannon Ellis, Elin Waring, and Michael Quinn. 2017. Skimr: Skimr. https://github.com/ropenscilabs/skimr.

Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.

Müller, Kirill. 2017. Here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.

Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2016. Plotly: Create Interactive Web Graphics via ’Plotly.js’. https://CRAN.R-project.org/package=plotly.

Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

Zhu, Hao. 2018. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.